Robust Pseudo Feedback Estimation and HMM Passage Extraction: UIUC at TREC 2006 Genomics Track
Abstract
The University of Illinois at Urbana-Champaign (UIUC) participated in the TREC 2006 Genomics Track. Our focus this year was to apply two language modeling techniques for information retrieval that our group proposed recently [4, 1]. Both techniques have been shown to be effective for general English text, but it is not clear how they perform on text in special domains such as the biomedical domain. We therefore tested their effectiveness on this year's genomics task.

First, we tried to improve the pseudo relevance feedback mechanism in the retrieval model by applying a recently proposed regularized estimation method [4]. In the KL-divergence retrieval framework, pseudo relevance feedback documents can be used to better estimate the query model [5]. In the originally proposed method [5], this estimation involves two parameters that must be set empirically; recent work showed that a more robust, regularized estimation method that requires less parameter tuning can be effective [4]. We applied this estimation method to this year's genomics task to see whether it can also improve pseudo relevance feedback in biomedical information retrieval.

Second, since this year's task is defined as passage retrieval rather than document retrieval, one challenge is how to extract coherent and relevant passages from whole documents. Previously, we proposed a hidden Markov model (HMM)-based passage extraction method that was shown to be effective in the general English domain [1]. We applied this method to this year's genomics task to see whether it is also effective for biomedical text.

Besides the two language modeling techniques, we also tested the use of user relevance feedback for retrieval, to see how much human interaction can improve performance. We obtained some manual judgments from two domain experts and used them in the two interactive runs.
Our experimental results showed that the regularized estimation method for pseudo relevance feedback performed similarly to the original estimation method when both methods used their optimal parameter settings, and outperformed the original estimation method when both used their default parameter settings. Because the optimal parameter setting is not known in practice, the regularized estimation method is more robust than the original one. Our results also showed that the HMM-based passage extraction method outperformed a baseline method that returns whole paragraphs as passages. However, the HMM-based method tends to return relatively long and coherent passages, which may not be optimal for this year's genomics task, where the information need is more specific. Finally, our results showed that user relevance feedback was very effective, as we expected.
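As a rough illustration of the KL-divergence framework with pseudo relevance feedback mentioned in the abstract, the following Python sketch ranks documents by the cross entropy between a query model and a smoothed document model, then re-estimates the query model from the top-ranked document. It is a simplification under stated assumptions: the feedback model here is a plain maximum-likelihood estimate, which is neither the original estimator of [5] nor the regularized estimator of [4], and the parameter names `alpha` and `mu`, the toy documents, and all function names are hypothetical.

```python
import math
from collections import Counter


def mle(tokens):
    """Maximum-likelihood unigram model: word -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def interpolate(query_model, feedback_model, alpha):
    """Updated query model: (1 - alpha) * original + alpha * feedback.

    alpha is a hypothetical interpolation weight chosen for illustration.
    """
    vocab = set(query_model) | set(feedback_model)
    return {
        w: (1 - alpha) * query_model.get(w, 0.0)
        + alpha * feedback_model.get(w, 0.0)
        for w in vocab
    }


def kl_score(query_model, doc_tokens, collection_model, mu=10.0):
    """Score rank-equivalent to negative KL(query || document).

    Computes sum_w p(w|Q) * log p(w|D) with a Dirichlet-smoothed
    document model; mu is an assumed smoothing parameter.
    """
    counts = Counter(doc_tokens)
    n = len(doc_tokens)
    score = 0.0
    for w, pq in query_model.items():
        pc = collection_model.get(w, 0.0)
        if pc == 0.0:
            continue  # word unseen in the collection: skipped for every document
        pd = (counts[w] + mu * pc) / (n + mu)
        score += pq * math.log(pd)
    return score


# Toy collection (made-up text, not TREC data).
docs = {
    "d1": "gene expression in yeast cells".split(),
    "d2": "protein binding and gene regulation".split(),
    "d3": "stock market prices fell".split(),
}
collection_model = mle([t for toks in docs.values() for t in toks])
query_model = mle("gene regulation".split())

# Initial retrieval.
ranking = sorted(
    docs, key=lambda d: kl_score(query_model, docs[d], collection_model), reverse=True
)

# Pseudo feedback: treat the top-ranked document as relevant and expand the query.
feedback_model = mle(docs[ranking[0]])
expanded = interpolate(query_model, feedback_model, alpha=0.5)
```

In this sketch the expanded query model picks up terms such as "protein" from the top-ranked document, which is the effect pseudo feedback is meant to achieve; how much weight those terms receive depends entirely on the assumed `alpha`, which is the kind of tuning sensitivity the abstract refers to.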
Similar resources
Task-Specific Query Expansion (MultiText Experiments for TREC 2003)
I. INTRODUCTION For TREC 2003 the MultiText Project focused its efforts on the Genomics and Robust tracks. We also submitted passage-retrieval runs for the QA track. For the Genomics Track primary task, we used an amalgamation of retrieval and query expansion techniques, including tiering, term rewriting and pseudo-relevance feedback. For the Robust Track, we examined the impact of pseudo-relev...
Improving the Robustness of Language Models - UIUC TREC 2003 Robust and Genomics Experiments
In this paper, we report our experiments in the TREC 2003 Genomics Track and the Robust Track. A common theme that we explored is the robustness of a basic language modeling retrieval approach. We examine several aspects of robustness, including robustness in handling different types of queries, different types of documents, and optimizing performance for difficult topics. Our basic retrieval m...
UIUC in HARD 2004--Passage Retrieval Using HMMs
UIUC participated in the HARD track in TREC 2004 and focused on the evaluation of a new method for identifying variable-length passages using HMMs. Most existing approaches to passage retrieval rely on pre-segmentation of documents, but the optimal boundaries of a relevant passage depend on both the query and the document. Our new method aims at determining or improving the boundaries of a rel...
Language Models for Genomics Information Retrieval: UIUC at TREC 2007 Genomics Track
The University of Illinois at Urbana-Champaign (UIUC) participated in the TREC 2007 Genomics Track. Our general goal of participation is to apply language model-based approaches to the genomics retrieval task and study how we may extend the standard language models to accommodate two special needs for this year’s genomics retrieval task: (1) gene synonym expansion and (2) conjunctive query interpret...
A comparative analysis of retrieval features used in the TREC 2006 Genomics Track passage retrieval task
OBJECTIVE: Identify the set of features that best explained the variation in the performance measure of TREC 2006 Genomics information extraction task, Mean Average Passage Precision (MAPP). METHODS: A multivariate regression model was built using a backward-elimination approach as a function of certain generalized features that were common to all the algorithms used by TREC 2006 Genomics track...
Publication date: 2006